15 research outputs found
Towards Practical Control of Singular Values of Convolutional Layers
In general, convolutional neural networks (CNNs) are easy to train, but their
essential properties, such as generalization error and adversarial robustness,
are hard to control. Recent research demonstrated that singular values of
convolutional layers significantly affect such elusive properties and offered
several methods for controlling them. Nevertheless, these methods present an
intractable computational challenge or resort to coarse approximations. In this
paper, we offer a principled approach to alleviating constraints of the prior
art at the expense of an insignificant reduction in layer expressivity. Our
method is based on the tensor-train decomposition; it retains control over the
actual singular values of convolutional mappings while providing structurally
sparse and hardware-friendly representation. We demonstrate the improved
properties of modern CNNs with our method and analyze its impact on the model
performance, calibration, and adversarial robustness. The source code is
available at: https://github.com/WhiteTeaDragon/practical_svd_conv
Comment: Published as a conference paper at NeurIPS 202
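The spectral machinery behind such methods is easiest to see in one dimension: for a single-channel circular convolution, the singular values of the full linear operator are exactly the moduli of the kernel's DFT, so they can be read off at FFT cost instead of via a dense SVD. The sketch below is illustrative only (not the paper's code) and verifies this identity with NumPy:

```python
import numpy as np

def circulant(kernel, n):
    """Dense n x n matrix of circular convolution with `kernel`."""
    k = np.zeros(n)
    k[:len(kernel)] = kernel
    # Column j is the kernel cyclically shifted by j: C[i, j] = k[(i - j) mod n]
    return np.stack([np.roll(k, j) for j in range(n)], axis=1)

n = 8
kernel = np.array([1.0, -2.0, 0.5])
C = circulant(kernel, n)

# Singular values from a dense SVD of the full operator ...
sv_dense = np.sort(np.linalg.svd(C, compute_uv=False))

# ... coincide with the moduli of the kernel's DFT.
k_pad = np.zeros(n)
k_pad[:len(kernel)] = kernel
sv_fft = np.sort(np.abs(np.fft.fft(k_pad)))

assert np.allclose(sv_dense, sv_fft)
```

Controlling the spectrum of a multi-channel 2D layer builds on the same idea, applied per frequency.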
TT-NF: Tensor Train Neural Fields
Learning neural fields has been an active topic in deep learning research,
focusing, among other issues, on finding more compact and easy-to-fit
representations. In this paper, we introduce a novel low-rank representation
termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense
regular grids and efficient methods for sampling from them. Our representation
is a TT parameterization of the neural field, trained with backpropagation to
minimize a non-convex objective. We analyze the effect of low-rank compression
on the downstream task quality metrics in two settings. First, we demonstrate
the efficiency of our method in a sandbox task of tensor denoising, which
admits comparison with SVD-based schemes designed to minimize reconstruction
error. Furthermore, we apply the proposed approach to Neural Radiance Fields,
where the low-rank structure of the field corresponding to the best quality can
be discovered only through learning.
Comment: Preprint, under review
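For readers unfamiliar with the TT format, the classic TT-SVD procedure (representative of the SVD-based baselines such methods are compared against; a minimal NumPy sketch, not the authors' code) factors a dense tensor into a chain of three-way cores by sequential truncated SVDs:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor a dense tensor into TT cores via sequential truncated SVDs."""
    shape = tensor.shape
    cores, r_prev, mat = [], 1, np.asarray(tensor)
    for n_k in shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, n_k, r))
        mat = S[:r, None] * Vt[:r]   # carry the remainder to the next core
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a dense tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(0)
# A rank-1 tensor is exactly representable with small TT-ranks.
a, b, c = rng.normal(size=4), rng.normal(size=5), rng.normal(size=6)
T = np.einsum('i,j,k->ijk', a, b, c)
cores = tt_svd(T, max_rank=2)
assert np.allclose(tt_reconstruct(cores), T)
```

TT-NF instead treats the cores as trainable parameters and fits them by backpropagation, which is what allows the best low-rank structure to be discovered through learning.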
Breathing New Life into 3D Assets with Generative Repainting
Diffusion-based text-to-image models ignited immense attention from the
vision community, artists, and content creators. Broad adoption of these models
is due to significant improvement in the quality of generations and efficient
conditioning on various modalities, not just text. However, lifting the rich
generative priors of these 2D models into 3D is challenging. Recent works have
proposed various pipelines powered by the entanglement of diffusion models and
neural fields. We explore the power of pretrained 2D diffusion models and
standard 3D neural radiance fields as independent, standalone tools and
demonstrate their ability to work together in a non-learned fashion. Such
modularity has the intrinsic advantage of easy partial upgrades, which has become
an important property in such a fast-paced domain. Our pipeline accepts any
legacy renderable geometry, such as textured or untextured meshes, orchestrates
the interaction between 2D generative refinement and 3D consistency enforcement
tools, and outputs a painted input geometry in several formats. We conduct a
large-scale study on a wide range of objects and categories from the
ShapeNetSem dataset and demonstrate the advantages of our approach, both
qualitatively and quantitatively. Project page:
https://www.obukhov.ai/repainting_3d_asset
Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation
We present an approach for encoding visual task relationships to improve
model performance in an Unsupervised Domain Adaptation (UDA) setting. Semantic
segmentation and monocular depth estimation are shown to be complementary
tasks; in a multi-task learning setting, a proper encoding of their
relationships can further improve performance on both tasks. Motivated by this
observation, we propose a novel Cross-Task Relation Layer (CTRL), which encodes
task dependencies between the semantic and depth predictions. To capture the
cross-task relationships, we propose a neural network architecture that
contains task-specific and cross-task refinement heads. Furthermore, we propose
an Iterative Self-Learning (ISL) training scheme, which exploits semantic
pseudo-labels to provide extra supervision on the target domain. We
experimentally observe improvements in both tasks' performance because the
complementary information present in these tasks is better captured.
Specifically, we show that: (1) our approach improves performance on all tasks
when they are complementary and mutually dependent; (2) the CTRL helps to
improve both semantic segmentation and depth estimation tasks performance in
the challenging UDA setting; (3) the proposed ISL training scheme further
improves the semantic segmentation performance. The implementation is available
at https://github.com/susaha/ctrl-uda.
Comment: Accepted at CVPR 2021; updated results according to the released
source code
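The pseudo-labelling step at the heart of an ISL-style scheme can be illustrated with a minimal sketch (threshold, names, and shapes are ours for illustration, not from the paper's implementation): target-domain pixels whose predicted class probability exceeds a confidence threshold receive a pseudo-label, and the rest are marked as ignored so they do not contribute to the loss.

```python
import numpy as np

IGNORE = 255  # conventional ignore index in semantic segmentation

def make_pseudo_labels(probs, threshold=0.9):
    """probs: (C, H, W) softmax output for one target-domain image."""
    conf = probs.max(axis=0)           # per-pixel confidence
    labels = probs.argmax(axis=0)      # per-pixel predicted class
    labels[conf < threshold] = IGNORE  # drop low-confidence pixels
    return labels

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
pl = make_pseudo_labels(probs, threshold=0.6)

# Confident pixels keep a class id in [0, C); the rest are IGNORE.
assert set(np.unique(pl)) <= {0, 1, 2, IGNORE}
```

The resulting labels then supervise the segmentation head on the target domain in subsequent training rounds.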
DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models
Perpetual view generation -- the task of generating long-range novel views by
flying into a given image -- is a novel and promising task. We introduce
DiffDreamer, an unsupervised framework capable of synthesizing novel views
depicting a long camera trajectory while training solely on internet-collected
images of nature scenes. We demonstrate that image-conditioned diffusion models
can effectively perform long-range scene extrapolation while preserving both
local and global consistency significantly better than prior GAN-based methods.
Project page: https://primecai.github.io/diffdreamer
Quantum Imaging with Incoherently Scattered Light from a Free-Electron Laser
The advent of accelerator-driven free-electron lasers (FEL) has opened new
avenues for high-resolution structure determination via diffraction methods
that go far beyond conventional x-ray crystallography methods. These techniques
rely on coherent scattering processes that require the maintenance of
first-order coherence of the radiation field throughout the imaging procedure.
Here we show that higher-order degrees of coherence, displayed in the intensity
correlations of incoherently scattered x-rays from an FEL, can be used to image
two-dimensional objects with a spatial resolution close to or even below the
Abbe limit. This constitutes a new approach towards structure determination
based on incoherent processes, including Compton scattering, fluorescence
emission or wavefront distortions, generally considered detrimental for imaging
applications. Our method extends the landmark intensity correlation
measurements of Hanbury Brown and Twiss to orders higher than second, paving the
way towards determining the structure and dynamics of matter in regimes where
coherent imaging methods have intrinsic limitations.
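A toy simulation conveys the underlying statistics (illustrative only, not the experiment's analysis code): thermal light has exponentially distributed shot-to-shot intensities, so the normalized equal-time correlations <I^n>/<I>^n approach n! -- the bunching signatures that second- and higher-order Hanbury Brown-Twiss measurements read out.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shot-to-shot intensities of thermal (chaotic) light follow an
# exponential distribution; simulate many independent shots.
I = rng.exponential(scale=1.0, size=200_000)

# Normalized equal-time correlations: g(n) = <I^n> / <I>^n -> n!
g2 = np.mean(I**2) / np.mean(I)**2   # second order, ~2 (HBT bunching)
g3 = np.mean(I**3) / np.mean(I)**3   # third order, ~6

assert abs(g2 - 2.0) < 0.05
assert abs(g3 - 6.0) < 0.5
```

Imaging with such correlations amounts to measuring these moments across detector pairs (or triples and beyond) rather than at a single point.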
Tensor Decompositions in Deep Learning
Tensor decomposition is a subdomain of multilinear algebra concerned with dimensionality reduction and the analysis of multi-dimensional arrays (tensors). The field has numerous applications in physics, chemistry, the life sciences, and, more recently, machine learning, computer vision, and graphics. Despite the maturity of the field, much progress has happened in recent years thanks to affordable parallel compute driving empirical research.

Deep learning is a young subdomain of machine learning concerned with fitting deep, non-linear parametric models in a non-convex optimization setting with abundant data. The tipping point of interest in deep learning came when a neural network (AlexNet) set a record-high score on a popular image classification benchmark (ImageNet), promising to solve long-standing computer vision problems. Over the past years, most breakthroughs in deep learning have come from finding smarter ways to increase model size and complexity. However, the need to deploy deep models on edge devices, such as for computational photography on mobile phones, has set a new direction towards finding lean models. On the other hand, many high-potential deep learning techniques, such as Neural Radiance Fields (NeRF) or vision transformers, leave a large margin for improvement upon inception.

In this thesis, we investigate the use of tensor decompositions in the context of modern deep learning techniques. We aim to improve several types of efficiency: memory footprint and runtime performance, measured in parameters and floating-point operations (FLOPs), respectively. We begin by exploring neural network layer compression schemes and propose a tensorized representation with a basis tensor shared among layers and per-layer coefficients. Subsequently, we study the manifold of Tensor Trains (TT) of fixed rank in the context of parameterizing layers of Generative Adversarial Networks (GANs) and demonstrate the ability to compress networks while maintaining training stability.
Finally, we utilize the TT parameterization to learn compressed NeRFs and devise sampling schemes with support for automatic differentiation to facilitate training. Unlike most previous works on tensor decompositions, we treat decompositions as models in the deep learning sense and update their parameters through backpropagation and optimization. As in prior art, tensorized formats admit certain algebraic operations, making them an appealing entity at the intersection of two prominent research directions.
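As a back-of-the-envelope illustration of the memory savings from TT parameterization (the layer size, mode factorization, and rank below are chosen by us for illustration, not taken from the thesis): a dense m x n weight matrix stores m*n parameters, while its TT-matrix format stores one small core per factored mode pair.

```python
def tt_params(in_modes, out_modes, rank):
    """Parameter count of a TT matrix with uniform internal rank."""
    d = len(in_modes)
    total = 0
    for k in range(d):
        # Boundary cores have rank 1 on their outer side.
        r_left = 1 if k == 0 else rank
        r_right = 1 if k == d - 1 else rank
        total += r_left * in_modes[k] * out_modes[k] * r_right
    return total

in_modes = out_modes = (4, 4, 4, 4, 4)   # 4^5 = 1024 on each side
dense = 1024 * 1024
tt = tt_params(in_modes, out_modes, rank=8)
print(dense, tt)  # → 1048576 3328

assert tt < dense // 100
```

The gap widens further as the dense dimensions grow, which is what makes the format attractive for lean, edge-deployable models.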